SWAT: A New Spliced Alignment Tool Tailored for Handling More Sequencing Errors

Authors

  • Yifeng Li
  • Hesham H. Ali
Abstract

Several computer programs align mRNA with its genomic counterpart to determine exon boundaries. Though most of these programs perform such alignment efficiently and accurately, they can tolerate only a relatively small number of sequencing errors. They also depend heavily on the GT/AG rule to find splice sites. Both properties make them less suitable for aligning EST-reconstructed transcripts with genomic DNA to identify splicing variants, where many sequencing errors and noncanonical splice sites are expected. Using a novel heuristic algorithm, we developed a tool that can handle many more sequencing errors. Results on test datasets indicate that SWAT (Sequencing-error Well-handled Alignment Tool) has a much stronger error-handling ability than Sim4 and Spidey, two other popular spliced alignment tools. In the presence of up to 10 percent randomly introduced sequencing errors, it still gives the correct number of exons and the precise exon boundaries in most cases. The robustness of SWAT makes it a desirable tool in cases where sequencing error is a concern. A web service is freely available at http://app1.unmc.edu/swat/swat.html.
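The two ideas the abstract leans on, the GT/AG rule and randomly introduced sequencing errors, can be illustrated with a minimal Python sketch. This is not SWAT's heuristic, which the abstract does not describe. The function names `is_canonical_intron` and `introduce_substitutions` are hypothetical, and the error model is an assumption: it uses substitutions only, whereas the abstract does not say which error types were introduced.

```python
import random


def is_canonical_intron(genomic: str, start: int, end: int) -> bool:
    """Return True if genomic[start:end] starts with GT and ends with AG.

    `start` and `end` are 0-based, end-exclusive intron coordinates
    (an assumption made for this sketch).
    """
    intron = genomic[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def introduce_substitutions(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base with a different base with probability `rate`.

    Substitution-only error model (assumed); insertions and deletions
    are not simulated here.
    """
    bases = "ACGT"
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick any base other than the current one.
            out.append(rng.choice([b for b in bases if b != base.upper()]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    genomic = "AAAGTCCCCAGTTT"          # toy sequence: exon, GT...AG intron, exon
    print(is_canonical_intron(genomic, 3, 11))
    noisy = introduce_substitutions(genomic, 0.10, random.Random(0))
    print(noisy)
```

A tool that relies only on the first check would miss noncanonical splice sites, and a substituted base at either intron end would also defeat it, which is the weakness the abstract attributes to GT/AG-dependent aligners.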

Similar resources

Sim4cc: a cross-species spliced alignment program

Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing ...

GeneSeqer@PlantGDB: Gene structure prediction in plant genomes.

The GeneSeqer@PlantGDB Web server (http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi) provides a gene structure prediction tool tailored for applications to plant genomic sequences. Predictions are based on spliced alignment with source-native ESTs and full-length cDNAs or non-native probes derived from putative homologous genes. The tool is illustrated with applications to refinement of current ge...

Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences with errors

Performance of existing algorithms for similarity-based gene recognition in eukaryotes drops when the genomic DNA has been sequenced with errors. A modification of the spliced alignment algorithm allows for gene recognition in sequences with errors, in particular frameshifts. It tolerates up to 5% of sequencing errors without considerable drop of prediction reliability when a sufficiently close...

DART: a fast and accurate RNA-seq mapper with a partitioning strategy

Motivation In recent years, the massively-parallel cDNA sequencing (RNA-Seq) technologies have become a powerful tool to provide high resolution measurement of expression and high sensitivity in detecting low abundance transcripts. However, RNA-seq data requires a huge amount of computational efforts. The very fundamental and critical step is to align each sequence fragment against the referenc...

RRBSMAP: a fast, accurate and user-friendly alignment tool for reduced representation bisulfite sequencing

SUMMARY Reduced representation bisulfite sequencing (RRBS) is a powerful yet cost-efficient method for studying DNA methylation on a genomic scale. RRBS involves restriction-enzyme digestion, bisulfite conversion and size selection, resulting in DNA sequencing data that require special bioinformatic handling. Here, we describe RRBSMAP, a short-read alignment tool that is designed for handling R...

Publication date: 2005